Pangloss: A Knowledge-based Machine Assisted Translation Research Project - Site 2

Author

  • David Farwell
Abstract

… are developing a Translator's Workstation to assist users in the translation of newspaper articles in the area of finance from Spanish or Japanese into English. At its core is a multi-engine MT system, consisting of a knowledge-based, interlingual system, an example-based system, and an extensive glossary and bilingual dictionary system. Results from all systems are combined in a chart structure which selects the most reliable and complete translation. The KBMT system consists of a source language analysis component, a mapper, and a target language generation component.

During the first two years of the project, CRL's objectives were to develop tools for constructing lexical items and ontological entries automatically from on-line resources, to develop the Spanish analysis component, and, jointly with CMT and ISI, to establish the infrastructure for the three-site project and develop the formats and initial content of the interlingua, the ontology, and the knowledge base. The second phase of the project also runs for two years, and we are currently in the first six months of this phase. CRL's responsibilities continue as before, with primary responsibility for Spanish analysis and joint planning of inter-site cooperation, but with the added task of fulfilling the knowledge-acquisition needs of all three sites, both automatic and manual.

RECENT RESULTS

In analysis, three new modules (a proper name recognizer, a clause boundary identifier and a syntactic dependency analysis module) have been added during the past year. As a result, full sentence throughput is now possible within the KBMT system. Acquisition work is continuing both in integrating sense tokens into the growing ontology and in reviewing and increasing the Spanish vocabulary in the system.

The analysis system begins with a dictionary-based part-of-speech tagger, followed by a component which groups the tagged text into small syntactic chunks. Chunks which are tagged as proper nouns are sent to the proper name recognizer for categorization. All chunks are then analyzed: semantic/lexical information is accessed and incorporated into the representation. These smaller constituents are then grouped into clause-level groups, which are further analyzed to produce ranked possible syntactic dependency structures.

With respect to knowledge acquisition, CRL is currently integrating sense tokens (concepts) into the Ontology Base. These sense tokens are drawn both from Longman's Dictionary of Contemporary English (LDOCE) and from Collins Spanish-English/English-Spanish Dictionary. Work is underway to provide a large set of tagged Spanish texts both for work within the project …
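
The first analysis stages named in the abstract (dictionary-based tagging, chunking, and routing proper-noun chunks to a name recognizer) can be pictured as a chain of small functions. The sketch below is illustrative only and is not the project's code: the toy lexicon, the tag names, the chunking rules, the company-suffix heuristic, and every function name are assumptions made for the example. Clause grouping and ranked dependency analysis are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy lexicon standing in for the system's dictionary.
TOY_LEXICON = {"el": "DET", "banco": "NOUN", "compró": "VERB", "acciones": "NOUN"}

# Hypothetical cue list for categorizing proper names.
COMPANY_SUFFIXES = {"S.A.", "Corp.", "Inc."}


@dataclass
class Chunk:
    tokens: List[str]
    tag: str
    category: Optional[str] = None  # filled in only for proper-noun chunks


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Dictionary lookup; unknown capitalized words are guessed to be proper nouns."""
    tagged = []
    for word in tokens:
        pos = TOY_LEXICON.get(word.lower())
        if pos is None:
            pos = "PROPN" if word[0].isupper() else "UNK"
        tagged.append((word, pos))
    return tagged


def chunk(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Group tagged words into small chunks: DET + NOUN, and runs of PROPN."""
    chunks: List[Chunk] = []
    for word, pos in tagged:
        last = chunks[-1] if chunks else None
        if last is not None and pos == "PROPN" and last.tag == "PROPN":
            last.tokens.append(word)
        elif last is not None and pos == "NOUN" and last.tag == "DET":
            last.tokens.append(word)
            last.tag = "NP"
        else:
            chunks.append(Chunk([word], pos))
    return chunks


def categorize_proper_names(chunks: List[Chunk]) -> List[Chunk]:
    """Send only proper-noun chunks to the (toy) name recognizer."""
    for c in chunks:
        if c.tag == "PROPN":
            c.category = "company" if c.tokens[-1] in COMPANY_SUFFIXES else "unknown"
    return chunks


def analyze(tokens: List[str]) -> List[Chunk]:
    """Run the stages in the order the abstract lists them."""
    return categorize_proper_names(chunk(tag(tokens)))


result = analyze(["El", "banco", "compró", "Telefónica", "S.A."])
for c in result:
    print(c.tag, c.tokens, c.category)
```

Keeping each stage as a separate function mirrors the modular description in the abstract, where new modules were added to the analysis component one at a time.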

Similar resources

The Workstation Substrate of the Pangloss Project

Pangloss is a new knowledge-based machine translation project carried out jointly by the Center for Machine Translation at Carnegie Mellon University, Computing Research Laboratory of New Mexico State University and Information Sciences Institute of the University of Southern California. One of the distinguishing features of systems built in this project is that they uniformly aim at high-quali...

In-Depth Knowledge-Based Machine Translation

The development of an integrated knowledge-based machine-aided translation system called PANGLOSS in collaboration with the Center for Machine Translation (CMT) at CMU and the Computing Research Laboratory (CRL) at New Mexico State University. The ISI part of the collaboration is focused initially on providing the system's output capabilities, primarily in English and then in other languages, ...

Pangloss: A Knowledge-based Machine Assisted Translation Research Project - Site

are developing a Translator's Workstation to assist a user in the translation of newspaper articles in the area of finance (mergers and acquisitions) in one language (Spanish initially) into a second language (English). At its core is a multilingual, knowledge-based, interlingual, interactive, machine-assisted translation system consisting of a source language analysis component, an interactiv...

PANGLOSS: Knowledge-Based Machine Translation

The goals of the PANGLOSS project are to investigate and develop a new-generation knowledge-based interlingual machine translation system, combining symbolic and statistical techniques. The system is to translate newspaper texts in arbitrary domains (though a specific financial domain is given preference) to as high quality as possible using as little human intervention as possible. The projec...

Pangloss: A Machine Translation Project

The project involves three sites (NMSU, USC, CMU) and is devoted to enhancing the state of the art in machine translation of natural language texts. Pangloss uses a hybrid, multi-engine approach, though knowledge-based machine translation takes a majority of resources. Types of work in the knowledge-based direction include: • continuing development of a set of knowledge acquisition tools and ut...

Publication date: 1994